SUITABILITY OF NON-RANDOM DESIGNS FOR PACE EVALUATION


Final  Report 
September  6,  1990 


Prepared  by  the  University  of  Minnesota 
for  the  HCFA  Research  Center  sponsored  by 
the  University  of  Minnesota,  the  University  of  Pennsylvania, 
and  Mathematica  Policy  Research 
Cooperative  Agreement  No.  99-C-99169/5-02 


Roger  Feldman,   Project  Investigator 
Bryan  Dowd 
Michael  Finch 


The  statements  contained  in  this  report  are  solely  those  of  the 
authors  and  do  not  necessarily  reflect  the  views  or  policies  of  the 
Health  Care  Financing  Administration. 


I.  INTRODUCTION


A.  Background 

HCFA  is  implementing  a  demonstration  project  to  test  the 
replicability and cost-effectiveness of the Program for
All-inclusive Care of the Elderly (PACE). This demonstration is
designed  to  test  a  unique  model  of  capitated,  totally  integrated 
service  delivery  for  the  very  frail  community-dwelling  elderly. 
The  model  will  be  implemented  in  8  sites  nationally. 

For a variety of reasons, an evaluation design in which
eligible  participants  are  randomly  assigned  to  treatment  and 
control  groups  is  not  feasible  for  the  PACE  replications.  Among 
the  reasons  for  not  using  randomization  are: 

•  Individuals  who  enroll  in  the  program  must  sever 
existing  provider  relations  and  agree  to  receive  care 
exclusively  from  the  PACE  site. 

•  A  site's  potential  market  is  constrained  both  by 
restrictive  eligibility  requirements  and  by  the 
limited  geographic  area  that  can  be  served  by  an  all- 
inclusive  center. 

•  Finally,  existing  providers  and/or  family  members  may 
not  be  willing  to  refer  eligible  individuals  to 
the  program. 

It  is  questionable  whether  centers  operating  under  these 
constraints  could  randomly  enroll  enough  individuals  to  become 
financially  viable,  or  to  provide  sufficient  sample  sizes  for  the 
evaluation.  However,  the  alternative  evaluation  design  -  using 
voluntarily  enrolled  participants  -  is  potentially  affected  by 
selectivity  bias.  That  is,  unobservable  variables  may  affect  both 
the  decision  to  enroll  in  PACE  and  the  behavioral  outcomes  of 
interest  (e.g.,  health  care  spending,  utilization  of  services,  and 
mortality). Selectivity bias poses a serious threat to the
internal  validity  of  the  evaluation. 

With  these  concerns  in  mind,  HCFA  has  proposed  to  use  a 
selectivity-corrected  model,  which  explicitly  models  both  the 
individual's  decision  to  enroll  in  PACE  and  the  behavioral  outcomes 
of  interest.  A  selection  variable  constructed  from  the  enrollment 
equation  will  be  used  to  control  for  unobservable  factors  related 
to  enrollment  and  behavioral  outcomes. 

B.  Description  of  Report 

This  report  consists  of  an  independent  assessment  of  two 
questions: First, is there any reasonable method for overcoming
the threat to internal validity posed by selectivity bias in the
PACE  evaluation?  Second,  is  HCFA's  proposed  approach,  using  a 
statistical  selectivity  correction,  an  appropriate  method?  We 
believe  that  the  answer  to  the  first  question  is  "yes."  The  answer 
to  the  second  question  is  also  "yes,"  provided  that  careful 
attention  is  given  to  several  methodological  issues  that  are 
described  in  this  report. 




C.     Study  Design/Methodology 


Statistical  corrections  for  selectivity  bias  have  become  an 
increasingly  popular  tool  for  program  evaluation.  Several  recent 
papers,  however,  claim  that  classical  experiments  are  necessary  for 
evaluating  federal  manpower  training  programs.  LaLonde  (1986), 
Fraker  and  Maynard  (1987),  and  LaLonde  and  Maynard  (1987)  had 
access  to  randomly  selected  groups  of  trainees.  Comparing 
experimental  results  with  those  from  non-experimental  estimators, 
they  found  that  the  non-experimental  results  vary  with  the  method 
used  and  differ  substantially  from  the  "true"  experimentally- 
estimated  effect. 

A  similar  critique  of  non-experimental  methods  has  relied  on 
simulation  analysis.  Manning,  Duan  and  Rogers  (1987)  simulated  the 
relative  performance  of  sample  selection  and  two-part  models  for 
data  with  a  cluster  at  zero.  The  data  were  drawn  from  a  bivariate 
normal  distribution  with  a  positive  correlation.  The  two-part 
models  were  no  worse  and  often  performed  appreciably  better  than 
selection models in terms of mean behavior, provided that the
"true" selection and outcome equations each contained the same
regressor variable (each equation had only one regressor).

Other  economists  have  responded  to  these  criticisms  with  two 
arguments.  First,  some  have  pointed  out  that  randomized 
experiments  are  not  necessarily  a  "gold  standard"  by  which  to  judge 
the  non-experimental  results.  Heckman  and  Hotz  (1987)  argue,  for 
example,  that  the  value  of  randomization  depends  on  the  parameters 
of  interest.  Suppose  that  one  is  interested  in  the  effect  of 
training  on  the  growth  rate  of  earnings.  Then,  even  random 
assignment  of  subjects  to  the  training  group  will  not  imply 
independence  of  last  period's  earnings  and  current  unobservable 
variables  for  these  subjects. 

These  authors  (Heckman  and  Hotz,  1987,  1988)  also  argue  that 
critics  of  selection  correction  models  have  failed  to  use  classical 
testing  procedures  to  select  among  the  alternative  possible 
estimators.  Thus,  the  failure  of  a  particular  estimator  to  remove 
selection  bias  does  not  mean  that  all  such  estimators  must  be 
rejected.  Heckman  and  Hotz  tested  alternative  selection  estimators 
in  order  to  select  the  best  approach  for  models  in  which  unobserved 
variables  affect  enrollment  in  the  experiment  and  pre-enrollment 
outcomes.  In  the  context  of  the  PACE  demonstrations,  PACE  enrollees 
may  differ  from  non-enrollees  in  ways  that  can  be  detected  by 
comparing  pre-enrollment  use  of  health  care  services. 

The  current  project  extends  Heckman  and  Hotz's  analysis  to 
include  possible  correlation  between  unobservable  variables  and 
post-enrollment  outcomes.  Thus,  it  will  consider  4  possible 
models: 

1.  Unobservable variables correlated with PACE
enrollment affect neither pre- nor post-enrollment
utilization. There is no selection bias in this model.
However,  there  may  be  selection  based  on  observable 
variables  such  as  age,  sex,  and  functional  health  status. 

2.  Unobservable  variables  affect  pre-enrollment  outcomes 
only.  This  is  the  case  analyzed  by  Heckman  and  Hotz 
(1987).

3.  Unobservable  variables  affect  post-enrollment 
outcomes  only.  For  example,  enrollees  in  a 
pharmaceutical  experiment  may  have  a  chemical  in  their 
blood  that  is  silent  until  they  receive  the  experimental 
drug.  The  chemical,  which  is  not  present  in  non- 
enrollees'  blood,  causes  an  unfavorable  reaction  to  the 
drug. 

4.  Unobservable variables affect both pre- and
post-enrollment outcomes. There are two forms of selection
bias in this case.

Procedures  exist  both  to  detect  the  presence  of  each  model 
versus  the  other  alternatives  and  to  make  the  right  correction  if 
that  model  is  present.  The  procedure  currently  favored  by  HCFA  - 
selectivity  correction  of  post-enrollment  outcomes  -  is  one 
possible  method  but  it  may  not  be  the  best  correction  in  all 
circumstances.


II.     ARGUMENTS  AGAINST  RANDOMIZATION 

A.     Constraints  on  Successful  Randomization 

Randomized  research  designs,  if  executed  properly,  are 
superior  to  non-random  designs.  With  random  assignment,  observed 
differences  between  the  treatment  and  control  groups  can  be 
attributed  to  demonstration  impacts  with  a  known  degree  of 
statistical  confidence.  However,  it  does  not  follow  that  research 
should  be  abandoned  if  randomized  designs  are  precluded. 
Procedures  exist  for  testing  and  correcting  for  deficiencies  in 
non-random  comparison  groups,  although  the  burden  of  proof  rests 
uncomfortably  with  the  chooser  of  the  non-random  design. 

While  randomized  designs  are  increasingly  dominant  in  the 
social  sciences,  a  number  of  circumstances  place  constraints  on 
their  successful  application: 

•  There  may  be  political  or  ethical  constraints  on  the 
use  of  random  assignment  for  the  population  in 
question.

•  The introduction of random assignment may alter the
program environment enough to call into question the
relevance of the demonstration being tested.

•  Within-site  control  groups  may  be  "contaminated"  by  the 
effects  of  the  experiment,  making  treatment-control 
comparisons  misleading. 

•  Subjects  may  voluntarily  disenroll  from  the  experimental 
group. 

These  constraints  assume  different  degrees  of  importance  in 
each  demonstration.  Concerning  the  first  issue,  there  are  broadly 
conceived  political  and  ethical  constraints  on  the  use  of  random 
assignment  in  the  PACE  demonstration.  Because  the  PACE  model  is  a 
totally  integrated  acute  and  long  term  care  service  delivery 
organization,  individuals  who  enroll  in  the  program 
must sever existing provider relations and agree to receive care
exclusively  from  the  PACE  site.  Thus,  although  the  PACE  program 
provides  all-inclusive  care,  it  may  be  perceived  by  some  as 
requiring  enrollees  to  give  up  an  important  freedom  -  freedom  to 
use  their  current  service  providers. 

Concerning  the  second  issue,  randomization  could  lead  to  a 
change  in  the  PACE  program  environment,  more  specifically,  to  a 
change  in  the  pool  of  eligible  applicants.  The  PACE  program  is 
available  only  to  elderly  persons  whose  frailty  is  severe  enough  to 
meet  state  certification  requirements  for  nursing  home  placement. 
PACE's applicant pool is created in part by referrals from
providers  and  family  members.  These  individuals  may  be  hesitant  to 
refer  an  elderly  person  to  PACE  if  there  is  a  chance  that  he  or  she 
may  not  be  enrolled  even  after  being  determined  eligible. 

Another  potential  change  in  the  program  environment  relates 
to  the  advertising  and  marketing  efforts  of  the  PACE  organization. 
Receiving  randomly-assigned  enrollees  without  having  to  expend 
resources  to  attract  them  may  reduce  the  organization's  advertising 
and  marketing  efforts.  If  disenrollees  are  replaced  by  new 
entrants,  randomization  may  also  reduce  efforts  to  keep  enrollees 
from  leaving.  This  might  lead  to  subtle  (or  not  so  subtle) 
degradation  of  program  quality. 

A  final  program  environment  constraint  stems  from  the  fact 
that  the  program's  potential  market  is  limited,  both  by  restrictive 
eligibility  requirements  and  by  geography.  As  noted  above,  the 
PACE  program  is  available  only  to  very  frail  elderly  persons,  whom 
HCFA  estimates  to  be  5-7%  of  the  elderly  population  in  the 
community.  Furthermore,  enrollees  in  most  instances  are  required 
to  come  to  the  site  for  services,  so  the  geographic  area  for 
marketing  is  limited.  Thus,  the  total  pool  of  eligible  persons  may 
be  quite  small.  To  be  economically  viable,  the  program  may  have  to 
recruit  (for  example)  60%  of  this  pool.  The  program  will  fail  if 
fewer than 60% of eligible persons are randomly enrolled. This
problem can be corrected by increasing the randomization percentage
to 60%. However, unless the evaluator has very good prior
knowledge  of  the  market  size  and  the  minimum  economically  viable 
PACE  market  share,  his  or  her  choice  of  the  randomization 
percentage  may  place  the  PACE  organization  at  substantial  risk. 

Contamination of within-site control groups is unlikely
in the PACE demonstration.2 Most PACE programs are located
in large cities where their success or failure and style of
operation  will  have  little  effect  on  the  overall  long  term  care 
delivery  system. 

Voluntary  disenrollment  has  been  of  limited  importance  at  On 
Lok,  where  few  participants  have  left  the  program.  However,  On  Lok 
has  been  in  existence  for  two  decades,  whereas  the  replication 
sites  are  inexperienced  at  accepting  capitation  payment  for  the 
complete  health  care  needs  of  a  frail  elderly  population.  It  is 
possible  that  more  enrollees  will  drop  out  in  a  replication  site, 
particularly  if  the  site's  performance  does  not  meet  expectations. 
If  this  happens,  then  randomization  at  enrollment  will  not  ensure 
that  randomization  applies  at  any  post-enrollment  time  period. 


B.     Cases  When  Randomization  Cannot  Be  Used 

In  addition  to  constraints  on  the  success  of  randomization, 
there are three cases in which randomization cannot be used:

•  When the outcome of an experiment depends on how many
people sign up.

•  When the rate of growth in an outcome variable is
important.

•  When the program will be implemented only on
volunteers.

The outcome of many social experiments depends on how many
people  "sign  up."  For  example,  the  effect  of  a  physician  preferred 
provider  organization  (PPO)  on  per  capita  use  of  health  care 
services  depends  on  how  many  people  enroll  in  the  PPO.  Since  the 
enrollment  proportion  cannot  be  determined  without  actually 
conducting  a  non-random  experiment,  it  is  obvious  that  this 
research  question  cannot  be  answered  by  randomization. 

Another  example,  much  more  subtle  than  the  first,  has  been 
suggested  by  Heckman  and  Hotz  (1987).  Suppose  that  the  experiment 
is  a  manpower  training  program  and  that  earnings  in  the  post- 
training  period  t  are  given  by: 

(1)     $Y_{it} = \beta_{0t} + \beta_{1t} Y_{i,t-1} + \alpha_1 D_i + U_{it}$,

where $Y_{it}$ is the earnings of person i in period t, $D_i$ is a dummy
variable representing whether person i received training or not,
and $U_{it}$ is serially correlated, e.g., $U_{it} = \phi_i + u_{it}$, where $\phi_i$ is a
person-specific, time-invariant effect and $u_{it}$ is an independently
and identically distributed (iid) disturbance. A properly
conducted random experiment makes $D_i$ independent of $U_{it}$, so that
ordinary least squares estimates of $\alpha_1$ will correctly identify the
effect of training on the level of earnings.

Suppose,  however,  that  one  also  wants  to  examine  the  effect 
of  training  on  the  rate  of  growth  of  earnings.  To  examine  this 
question,  Heckman  and  Hotz  generalized  equation  (1) : 

(2)     $Y_{it} = \beta_{0t} + \beta_{1t} Y_{i,t-1} + \alpha_1 D_i + \alpha_2 D_i Y_{i,t-1} + U_{it}$,

where $\alpha_2$ indicates the effect of training on the growth rate of
earnings. In this case, even if one conducts a randomized
experiment, the independence of $D_i$ with respect to $U_{it}$ does not
imply independence of $D_i Y_{i,t-1}$ with respect to $U_{it}$, which are
connected by the following chain of correlations: $D_i Y_{i,t-1}$ and $U_{i,t-1}$
are correlated; $U_{i,t-1}$ and $U_{it}$ are correlated; therefore, $D_i Y_{i,t-1}$ and
$U_{it}$ are correlated. Thus, only part of the bias that would affect
the estimation of (2) can be solved by randomization. In
particular the effect of training on the growth rate of earnings
cannot be determined from a randomized experiment.3
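
The chain of correlations described above can be checked with a small simulation. The sketch below is illustrative only and rests on assumptions not found in the report: standard normal draws for the fixed effect and the iid disturbances, mean-centered lagged earnings, a 50% assignment probability, and an arbitrary seed and sample size. It shows only that the interaction of the treatment dummy with lagged earnings is correlated with the current disturbance while the dummy itself is not; it does not measure the size of any resulting bias in the growth-rate coefficient.

```python
import math
import random

def corr(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(0)   # arbitrary fixed seed (assumption)
n = 20000                # hypothetical sample size

d, dy_lag, u_t = [], [], []
for _ in range(n):
    phi = rng.gauss(0, 1)              # person-specific fixed effect
    y_lag = phi + rng.gauss(0, 1)      # lagged earnings, mean-centered, driven by the lagged disturbance
    u_now = phi + rng.gauss(0, 1)      # current disturbance: fixed effect plus iid noise
    treated = 1 if rng.random() < 0.5 else 0   # random assignment to training
    d.append(treated)
    dy_lag.append(treated * y_lag)     # interaction regressor in equation (2)
    u_t.append(u_now)

corr_d_u = corr(d, u_t)        # dummy vs. current disturbance
corr_dy_u = corr(dy_lag, u_t)  # interaction vs. current disturbance
print(round(corr_d_u, 3), round(corr_dy_u, 3))
```

Under these assumptions the first correlation should be close to zero and the second clearly positive, because the fixed effect appears in both lagged earnings and the current disturbance.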

This  seemingly  arcane  problem  may  be  of  practical  importance 
in the PACE demonstration, in which it is reasonable to believe
that the totally integrated PACE model may be more effective in
managing  care  for  very  high-risk  elderly  among  the  eligible 
population.  Suppose  that  risk  is  measured  by  prior  utilization  of 
services  or  prior  expenditures.  The  PACE  effect  on  utilization  (or 
expenditure)  growth  cannot  be  determined  by  a  randomized 
experiment,  if  the  type  of  bias  suggested  by  Heckman  and  Hotz 
(1987) is present.


Many  social  experiments  will  be  implemented  only  on 
volunteers.  Randomization  of  subjects  in  a  pre-implementation 
experiment  is  conceptually  inappropriate  when  this  is  the  case.  An 
example  is  a  voluntary  corporate-sponsored  smoking  cessation 
program.  If  the  company  wishes  to  evaluate  this  program,  it  should 
not  randomly  assign  smokers  to  the  experimental  and  control  groups. 
The  company  might  ask  for  volunteers  and  randomly  assign  these 
volunteers  to  the  two  groups.  Although  randomization  of  volunteers 
is  conceptually  acceptable,  it  may  create  political  or  ethical 
problems.  We  can  imagine  that  refusing  to  accept  volunteers  for 
the  PACE  demonstration  would  be  ethically  objectionable. 

As  an  example  of  the  need  to  rely  on  voluntary  selection, 
consider  what  would  happen  if  the  PACE  capitation  rate  were 
determined  by  an  AAPCC  (adjusted  average  per  capita  cost)  or 
similar method that relies on average non-PACE costs. Suppose that
there  are  two  types  of  individuals  in  the  eligible  population:  Type 
X,  with  PACE  costs  of  $1,000  per  month  and  non-PACE  costs  of  $2,000 
per  month;  and  Type  Y,  with  PACE  and  non-PACE  costs  of  $2,000  and 
$4,000  per  month.  Suppose,  further,  that  these  types  each  comprise 
half  the  eligible  population.  A  properly  conducted  randomized 
experiment would select experimental and control groups with a 50-50
mix of the two types. This experiment would discover that the average
non-PACE  cost  is  $3,000  per  month,  and  it  would  also  discover  that 
the  PACE  program  effect  is  $1,500  per  month.  However,  it  would  not 
discover  the  AAPCC,  which  is  between  $2,000  and  $4,000  per  month, 
depending  on  whether  Type  X  or  Type  Y  people  are  attracted  to  PACE. 
Only  if  voluntary  enrollment  were  not  selective  would  a  randomized 
experiment  discover  the  AAPCC,  and  in  this  case  randomization  would 
be  unnecessary. 
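
The arithmetic in this example can be reproduced directly. The sketch below uses only the monthly costs stated in the text; the function name `mean_cost` and the grid of enrollee mixes are illustrative choices, and "the non-PACE cost of whoever enrolls" is this sketch's reading of the AAPCC-style figure discussed above.

```python
# Monthly costs from the example in the text.
NON_PACE_COST = {"X": 2000, "Y": 4000}
PACE_COST = {"X": 1000, "Y": 2000}

def mean_cost(costs, share_x):
    """Average monthly cost for a group in which a fraction share_x is Type X."""
    return share_x * costs["X"] + (1 - share_x) * costs["Y"]

# Randomized experiment: both arms mirror the eligible population (half X, half Y).
avg_non_pace = mean_cost(NON_PACE_COST, 0.5)
program_effect = avg_non_pace - mean_cost(PACE_COST, 0.5)

# Voluntary enrollment: the relevant figure is the non-PACE cost of whoever
# enrolls, which depends on the (unknown) share of Type X among enrollees.
enrollee_cost_by_share = {
    s: mean_cost(NON_PACE_COST, s) for s in (0.0, 0.25, 0.5, 0.75, 1.0)
}

print(avg_non_pace, program_effect)
print(enrollee_cost_by_share)
```

The randomized figures match the text ($3,000 and $1,500), while the enrollee-based figure ranges from $2,000 (only Type X enrolls) to $4,000 (only Type Y enrolls).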




III.   VARIETIES  OF  SELECTION 


A.  Selection  Based  on  Observable  Variables 

It  would  be  unusual  if  enrollees  in  a  non-randomized 
experiment  did  not  differ  at  all  from  non-enrollees.  In  fact,  if 
this  were  the  case,  the  experimental  and  comparison  groups  would  be 
randomly  selected.  The  evaluator  could  determine  the  true 
experimental effect on the outcome variable by using a simple
t-test.

In  practice,  enrollees  in  a  non-random  experiment  will  differ 
from  non-enrollees  on  the  basis  of  observed  variables.  We  will 
refer  to  these  differences  as  "selection  on  observable  variables" 
or  "biased  selection,"  as  it  is  often  called  in  the  literature. 
For  example,  we  have  found  biased  selection  by  age  in  the  choice  of 
employment-based  health  insurance  plans  (Feldman,  Finch,  Dowd,  and 
Cassou, 1989). Older employees prefer plans that offer a wide
choice  of  physicians  and  hospitals,  compared  with  plans  that  limit 
this  choice. 

Now  let  us  suppose  that  one  wants  to  determine  the  true 
effect  of  health  plan  membership  on  the  use  of  health  care 
services.  To  the  extent  that  age  also  affects  the  use  of  services, 
age  must  be  controlled  in  the  utilization  equations.  Otherwise, 
part  of  the  effect  of  age  on  use  would  be  misinterpreted  as  a 
health  plan  effect. 

Generally  speaking,  there  are  two  ways  to  control  for 
observable  variables:  statistical  control  and  matching.  In  a 
statistically  controlled  experiment,  the  relevant  observable 
variables  are  measured  and  included  on  the  right-hand-side  of  the 
estimated  regression  equation.  Thus,  they  are  "held  constant," 
allowing  the  evaluator  to  estimate  the  true  experimental  effect. 

In  a  matched  (or  "matched  pairs")  experiment,  each 
experimental  subject  is  matched  with  a  control  subject  who  has 
similar  observed  characteristics.  This  approach  is  commonly  used 
in  epidemiological  studies  where  the  number  of  subjects  is  small 
and  there  is  a  need  for  experimental  control  over  independent 
variables.  Matched  pairs  designs  are  particularly  useful  if  the 
number  of  relevant  independent  variables  is  small  and  the  variables 
are  categorical,  e.g.,  sex  and  race.  In  such  cases,  it  is  fairly 
easy  to  find  a  "match"  for  each  experimental  subject.  Matched 
pairs  studies  are  less  useful  if  there  are  many  independent 
variables  which  cannot  easily  be  combined  into  discrete  categories, 
or  if  it  is  costly  to  screen  potential  control  subjects  for 
matching  with  experimental  subjects. 
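
A minimal sketch of matched-pairs selection on a small set of categorical variables follows. The function name `exact_match`, the record layout, and the subject records are all hypothetical and invented for illustration; the sketch pairs each experimental subject with the first unused control who agrees on every matching variable and reports subjects for whom no match exists.

```python
from collections import defaultdict

def exact_match(treated, pool, keys):
    """Pair each treated subject with an unused control that agrees on every key.

    Returns (pairs, unmatched): pairs is a list of (treated_id, control_id),
    unmatched lists treated ids for which no control was available.
    """
    available = defaultdict(list)
    for c in pool:
        available[tuple(c[k] for k in keys)].append(c["id"])
    pairs, unmatched = [], []
    for t in treated:
        cell = available[tuple(t[k] for k in keys)]
        if cell:
            pairs.append((t["id"], cell.pop(0)))  # use each control at most once
        else:
            unmatched.append(t["id"])
    return pairs, unmatched

# Hypothetical records, invented for illustration only.
treated = [{"id": "T1", "sex": "F", "race": "A"},
           {"id": "T2", "sex": "M", "race": "B"},
           {"id": "T3", "sex": "F", "race": "B"}]
pool = [{"id": "C1", "sex": "M", "race": "B"},
        {"id": "C2", "sex": "F", "race": "A"},
        {"id": "C3", "sex": "F", "race": "A"}]

pairs, unmatched = exact_match(treated, pool, keys=("sex", "race"))
print(pairs, unmatched)
```

Adding more matching variables, or variables that do not fall into a few discrete categories, shrinks each cell of potential controls, which is the practical limitation noted above.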

B.  Selection  Based  on  Unobserved  Variables 


Suppose that participants in a non-random experiment have some
unmeasured determinant of participation that is correlated with an
unmeasured  determinant  of  the  experimental  outcome.  Selection  of 
this  type  is  known  as  "selection  bias"  or  "selectivity  bias."  The 
central  result  of  the  selection  bias  literature  is  that  ordinary 
least  squares  (OLS)  estimates  of  the  experimental  effect  will  be 
biased  because  the  participation  variable  is  correlated  with  the 
error  term  in  the  outcome  equation. 




We  will  use  the  following  model,  adapted  from  Moffitt  (1987), 
to illustrate the selection bias problem:

(3)     $Y_{it} = \beta_t X_{it} + \alpha D_i + U_{it}$

(4)     $Y_{i,t-1} = \beta_{t-1} X_{i,t-1} + U_{i,t-1}$

(5)     $D_i^* = \gamma Z_i + V_i$

(6)     $D_i = 1$ if $D_i^* > 0$
        $D_i = 0$ if $D_i^* \leq 0$.

Equation  (3)  is  a  modified  version  of  the  outcome  equation  used  by 
Heckman and Hotz (1987), where $X_{it}$ are observable independent
variables  that  affect  earnings  after  participation  in  the 
experiment,  and  the  constant  term  has  been  omitted  for  simplicity. 
Equation  (4)  determines  pre-experimental  earnings  in  terms  of 
independent  variables  observed  in  the  pre-experimental  period.  The 
"propensity  to  participate"  in  the  experiment  is  given  by  equation 
(5), and actual participation is determined by equation (6).

As Moffitt (1987) notes, any variable that is in $Z_i$ can
easily be included in $X_{it}$ if it is thought to affect outcomes
directly.  Thus,  for  selection  bias  to  occur,  it  must  be  the  case 
that  individuals  and/or  the  program  operator  have  more  knowledge  of 
these  common  variables  than  the  analyst,  to  whom  they  appear  as 
components  of  the  error  terms. 
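
The consequence of this correlation can be illustrated by simulation. The sketch below is a stylized version of the model above under assumptions that are not in the report: no X variables, a single standard normal Z with coefficient one, standard normal errors, a hypothetical true effect of -1, and an unobserved participation term that enters the outcome error one-for-one. The naive participant versus non-participant difference in mean outcomes (equivalent to OLS of the outcome on the participation dummy) is compared with the true effect.

```python
import random

rng = random.Random(1)   # arbitrary fixed seed (assumption)
n = 20000                # hypothetical sample size
alpha = -1.0             # hypothetical true program effect
rho = 1.0                # hypothetical loading of the unobserved participation term on the outcome error

y_part, y_non = [], []
for _ in range(n):
    z = rng.gauss(0, 1)              # observed determinant of participation
    v = rng.gauss(0, 1)              # unobserved determinant of participation
    u = rho * v + rng.gauss(0, 1)    # outcome error, correlated with v
    d = 1 if z + v > 0 else 0        # participation rule, coefficient on z set to 1
    y = alpha * d + u                # outcome equation with no X variables
    (y_part if d else y_non).append(y)

# Naive estimate: difference in mean outcomes between the two groups.
naive = sum(y_part) / len(y_part) - sum(y_non) / len(y_non)
bias = naive - alpha
print(round(naive, 3), round(bias, 3))
```

Because participants have systematically higher values of the unobserved term, the naive estimate in this setup overstates the outcome for participants and the estimated effect is biased upward; setting `rho = 0.0` removes the correlation and, with it, the bias.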

1.     Fixed-Effect  Selection  Bias 

Selection  bias  can  be  corrected  without  a  statistical  "fix," 
by  first-differencing  the  outcome  variables,  if  unobserved 
participation  and  outcome  variables  are  correlated  only  through  the 
fixed effect $\phi_i$. Differencing $Y_{it}$ and $Y_{i,t-1}$ eliminates the fixed
effect and therefore would eliminate the correlation between
unobserved  participation  and  outcome  variables.  We  will  refer  to 
selection  bias  of  this  type  as  "fixed-effect  selection  bias." 

To  illustrate  fixed-effect  selection  bias  and  how  to  cure  it, 
suppose  that  participants  in  an  experiment  are  healthier  than  non- 
participants,  that  this  is  the  only  difference,  and  that  good 
health  is  a  fixed  effect  which  reduces  expenditures  (the  outcome 
variable)  by  $10  per  time  period.  The  following  matrix 
illustrates  fixed-effect  selection  bias: 


                      Before Experiment    After Experiment
                      (Time t-1)           (Time t)

Participants                100                  80

Non-Participants            110                 120


Numbers  in  each  cell  represent  mean  spending  by  each  group, 
before  and  after  the  experiment.  The  true  experimental  effect  can 
be  found  by  first-differencing  each  group's  mean  spending  and  then 
subtracting  the  control  group's  difference  from  the  experimental 
group's difference: experimental effect = (80-100) - (120-110) =
-30, i.e., the experiment reduces spending by $30. Another way to
interpret  this  result  is  that  $10  of  the  observed  $40  post- 
experiment  difference  between  participants  and  non-participants  is 
due  to  the  fixed  effect  of  good  health  in  the  experimental  group. 
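
The calculation above can be written out as a short script. The sketch uses only the four cell means from the matrix; the variable and function names are illustrative.

```python
# Mean spending from the matrix above: (before, after) for each group.
participants = (100, 80)
non_participants = (110, 120)

def change(before_after):
    """Change in mean spending from time t-1 to time t."""
    before, after = before_after
    return after - before

# Difference of first differences: the experimental effect.
did_effect = change(participants) - change(non_participants)

# Gap between groups before and after the experiment.
pre_gap = participants[0] - non_participants[0]    # fixed effect of good health
post_gap = participants[1] - non_participants[1]   # observed post-experiment gap

print(did_effect, pre_gap, post_gap)
```

The post-experiment gap of -40 splits into the pre-existing gap of -10 (the fixed effect) and the experimental effect of -30.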


In  terms  of  the  mathematical  model,  we  have: 

(7)     $Y_t | (D=1) = \beta_t X_t + \alpha + \phi$

(8)     $Y_{t-1} | (D=1) = \beta_{t-1} X_{t-1} + \phi$

(9)     $Y_t | (D=0) = \beta_t X_t$

(10)    $Y_{t-1} | (D=0) = \beta_{t-1} X_{t-1}$

Variables without the individual subscript i are expected values.
Thus, equations (7) and (8) represent expected experimental group
spending after and before enrollment in the experiment, respectively.
Equations (9) and (10) represent expected control group spending
after and before the experiment. For simplicity, we have assumed that
the observed correlates of spending (the X variables) do not differ
between groups. Subtracting (8) from (7) yields
$\beta_t X_t - \beta_{t-1} X_{t-1} + \alpha = -20$; subtracting (10) from (9) yields
$\beta_t X_t - \beta_{t-1} X_{t-1} = 10$. The difference of these differences is
$\alpha = -20 - 10 = -30$.

Unfortunately,  there  is  ample  indirect  evidence  that  non- 
random  selection  of  subjects  into  experiments  involves  more  than  a 
fixed  effect.  Moffitt  (1987)  cites  the  finding,  from  manpower 
experiments,  that  earnings  profiles  of  trainees  are  notably 
different from those of non-trainees prior to the time of
training.4  In  addition,  even  if  pre-training  profiles  differ  by 
only  a  fixed  effect,  training  may  interact  with  the  fixed  effect  so 
that  it  does  not  completely  difference  away.5  Thus,  estimates  of 
a  that  rely  on  this  particular  longitudinal  structure  of  earnings 
are  ultimately  unreliable. 

2.     A General Guide to Selection Bias

Fixed-effect  selection  bias  is  a  special  case  of  a  more 
general  model  in  which  unobserved  variables  affect  participation 
and  pre-experimental  outcomes.  In  fact,  as  we  outlined  earlier, 
there  are  three  general  types  of  selection  bias: 

•  Unobservable variables correlated with PACE enrollment
affect pre-enrollment outcomes only (type 1 selection
bias).

•  Unobservable variables affect post-enrollment outcomes
only (type 2 selection bias).

•  Unobservable variables affect both pre- and
post-enrollment outcomes (type 3 selection bias).


We  will  now  explain  in  more  detail  how  to  diagnose  and 
correct  each  type  of  selection  bias.  The  goal  is  to  arrive  at  a 
model  in  which  unobserved  variables  correlated  with  PACE  enrollment 
do  not  affect  outcomes  in  either  time  period. 

The first test for diagnosing selection bias is to run OLS
equations on pre-enrollment outcomes. If this initial test reveals
that $Y_{t-1} | (D=1) = Y_{t-1} | (D=0)$, then there is no selection bias or else
there is selection bias of type 2. On the other hand, if
$Y_{t-1} | (D=1) \neq Y_{t-1} | (D=0)$, the first test "fails" and it follows that there is
selection bias of type 1 or 3. Let us follow up both branches that
result from this initial test, starting with the branch where the
test is "passed."

a.  First  Test  Is  Passed 

The next test is to estimate the change-in-outcome
equation, including a selection term of one's choice. If this term
is insignificant, there is no selection bias in the model;
otherwise, there was selection bias of type 2, and the selection term
has removed this bias, thus reducing the model to the no-selection
case. However, if one repeats this test with different selection
terms, one may find that some of these terms are statistically
insignificant while others are statistically significant. In this
case, one must conclude that the results are sensitive to the
specification of the participation equation. Different
specifications of the participation equation should then be estimated
to determine how far the selection bias and outcome results
depend on that specification.6

b.  First  Test  Fails 

If  the  first  test  fails,  the  evaluator  knows  that  there  is 
selection bias of type 1 or 3. In this case the evaluator should
estimate a selection-corrected equation for the pre-enrollment
outcome. This correction will remove pre-enrollment selection
bias  from  the  model.  It  is  possible  that  several  corrections 
perform  equally  well  in  removing  this  bias. 

The  only  remaining  potential  source  of  selection  bias  is  due 
to  correlation  between  unobservable  determinants  of  participation 
and  post-enrollment  outcomes.  The  investigator  should  therefore 
estimate  the  change-in-outcome  equation  with  a  selection  term.  If 
this  term  is  not  significant,  then  type  1  selection  bias  was 
present. A significant selection term in the change-in-outcome
equation  implies  that  type  3  selection  was  present.  However,  it 
has  been  removed  by  the  selection  term.  The  experimental  and 
comparison  groups  can  now  be  compared  as  if  there  were  no  selection 
bias  in  the  model. 
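
The two-step diagnosis described in this section can be summarized as a small decision function. This is only a sketch of the logic in the text: the function name `diagnose` and its boolean inputs are hypothetical, and in practice each input would be the result of a statistical test with its own specification and significance level.

```python
def diagnose(pre_outcomes_differ, selection_term_significant):
    """Map the two test results to the selection types named in the text.

    pre_outcomes_differ: True if the first test "fails", i.e. pre-enrollment
        outcomes differ between enrollees and non-enrollees.
    selection_term_significant: True if the selection term in the
        change-in-outcome equation is statistically significant.
    """
    if not pre_outcomes_differ:
        # First test passed: no selection bias, or type 2.
        return "type 2" if selection_term_significant else "none"
    # First test failed: type 1 or type 3.
    return "type 3" if selection_term_significant else "type 1"

for pre in (False, True):
    for sel in (False, True):
        print(pre, sel, diagnose(pre, sel))
```

The function deliberately ignores the sensitivity issue raised above: if different selection terms give conflicting significance results, no single call to `diagnose` settles the classification.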

The major point to summarize here is that selection bias can
be removed from the model by using an appropriate statistical
correction. After this has been done, and selection based on
observed variables has been accounted for (either by matching or
statistical control), the experimental and control groups resemble
those that would have been enrolled in a random experiment.




IV.     SELECTION  ISSUES  IN  THE  PACE  EVALUATION 

A.  Defining  the  Choices  and  the  Choosers 

The  population  eligible  for  PACE  will  be  screened  and  all 
will  be  certified  to  receive  long  term  care  services  in  nursing 
homes  or  the  community.  Those  who  choose  PACE  will  receive  a  set 
of  services  which  may  differ  by  site  and  by  person  but  will  include 
at  least  some  long  term  care  (LTC)  services  for  everyone.  Those 
who  do  not  choose  PACE  may  receive  nursing  home  care,  community- 
based  long  term  care,  or  no  long  term  care  services. 

Some  have  argued  that  the  last  group  —  non-enrollees 
receiving  no  LTC  services  —  should  be  excluded  from  the 
evaluation.7  Their  reasoning  is  that  all  PACE  enrollees  actually 
receive  LTC  services.  Including  people  from  the  "no  cost"  group  in 
the  comparison  sample  will  bias  the  evaluation  because  expenditures 
are  always  lower  for  people  who  don't  receive  services. 

We  disagree  with  this  argument.  The  goal  of  the  PACE 
evaluation  is  to  determine  whether  health  care  use  and  Medicare  and 
Medicaid  expenditures  differ  for  PACE  enrollees  and  a  comparison 
group  of  individuals  in  traditional  Medicare  and  Medicaid.  An 
additional  goal  of  the  evaluation  is  to  compare  health  status  and 
mortality  between  PACE  enrollees  and  participants  in  the 
traditional  system.  These  goals  define  the  choice  implicit  in  the 
PACE  evaluation:  did  the  eligible  person  choose  PACE  or  did  he/she 
choose  traditional  Medicaid  and/or  Medicare?8  PACE  enrollees 
receive  a  "PACE  style  of  care"  that  includes  different  types  of 
services.  Participants  in  the  traditional  system  receive  the 
"traditional  style  of  care"  that  may  include  nursing  homes  and 
community-based  LTC.  Or  it  may  include  no  Medicare  or  Medicaid 
reimbursed  services.  This  is  an  equally  legitimate  "input"  in  the 
traditional  system  as  are  nursing  homes  and  community-based  LTC. 

The  need  to  keep  non-users  in  the  comparison  group  is 
especially  clear  when  we  consider  the  health  outcomes  goal  of  PACE. 
It  is  hoped  that  PACE  maintains  the  health  of  its  enrollees  better 
than  the  traditional  system  maintains  the  health  of  a  comparable 
group  of  frail  elderly  people.  To  test  this  hypothesis,  it  is 
necessary  to  keep  the  frail  elderly  who  use  no  services  in  the 
comparison  group.  Removing  the  no-service  group  from  the 
comparison  sample  might  actually  reduce  average  outcomes  in  this 
sample.  This  would  occur  to  the  extent  that  non-users  of  services 
tend  to  be  healthier  than  users.  Such  a  comparison,  clearly,  would 
be  inappropriate. 

B.  Identification  of  the  Outcome  Equation 

Identification of the outcome equation in a selection bias
model requires that at least one regressor in the enrollment
equation is excluded from the outcome equation.9 In other words,
we want the enrollment equation to "bring something" to the outcome
equation that is not already there. HCFA has proposed several
variables that may influence the PACE enrollment decision but not
outcomes:


•  Premium  payment 

•  Previous  year's  out-of-pocket  spending 

•  Attachment  to  a  physician 

•  Perceived  health  status 

•  Attitude  toward  or  familiarity  with  managed  care  plans 

•  Attitude  toward  institutionalization 

•  Degree  of  "sociability." 

We  doubt  that  some  of  these  variables  are  exogenous  to 
outcomes.  In  particular,  previous  out-of-pocket  spending  is  likely 
to  affect  post-enrollment  spending.  HCFA's  argument  for  exogeneity 
is  that  a  person  who  has  faced  high  expenditures  in  the  previous 
year  should  be  more  likely  to  enroll  in  a  plan  where  all  services 
are  covered.  But  this  is  true  only  if  high  expenditures  in  one 
year  persist  to  at  least  some  degree  in  the  following  year. 
Otherwise,  individuals  would  shrug  off  this  year's  high 
expenditures  as  an  aberration  and  stay  in  their  present  plans. 
Thus,  transitory  components  of  spending  are  unrelated  to  both 
choice  and  current  spending  whereas  permanent  components  of 
spending  are  related  to  both  choice  and  current  spending.  HCFA's 
hoped-for  factor  —  transitory  spending  that  affects  health  plan 
choice  —  probably  doesn't  exist. 

Attachment  to  a  physician  is  directly  related  to  health  care 
spending.  People  are  attached  to  physicians  because  they 
chronically  spend  lots  of  money  on  health  care  services  obtained 
from  those  physicians.  Attitude  toward  institutionalization  and 
degree  of  sociability  may  also  affect  health  care  spending. 

HCFA's  argument  for  using  perceived  health  status  as  an 
identifying  variable  relies  on  the  distinction  between  perceived 
and  actual  health.  Controlling  for  actual  health  status,  which 
does  affect  health  care  spending,  HCFA  believes  that  perceived 
health  status  will  be  unrelated  to  spending.  The  opposite  effects 
are  believed  to  occur  in  the  PACE  enrollment  equation,  where  only 
perceived  health  is  believed  to  predict  enrollment.  Newhouse  et 
al.  (1989)  have  reported  that  subjective  health  measures,  including 
functional  status,  do  not  do  as  well  in  terms  of  predicting  health 
care  spending  as  dichotomous  physiologic  measures,  and  they  add 
little  to  the  predictive  power  of  continuous  physiologic  measures. 
Therefore,  the  first  part  of  HCFA's  argument  (perceived  health 
status  is  unrelated  to  spending  when  actual  health  status  is 
controlled)  seems  to  be  correct,  although  functional  status 
measures  may  be  more  important  in  an  impaired  population.  However, 
the  second  part  of  the  argument  (actual  health  status  is  unrelated 
to  PACE  enrollment  when  perceived  health  is  controlled)  is 
speculative.  We  are  not  aware  of  any  studies  comparing  the 
predictive  power  of  perceived  versus  actual  health  variables  in 
health plan enrollment studies. Consequently, perceived health
status is a possible identifying variable, but more analysis of the
choice equation needs to be done before we can be confident in this
regard.


Attitude  toward  or  familiarity  with  managed  care  plans  may  be 
related  indirectly  to  outcomes.  People  who  have  been  enrolled  in 
HMOs  (either  through  their  job  or  in  Medicare-sponsored  HMOs)  are 
more  familiar  with  the  HMO  concept  and  are  more  likely  to  have 
favorable  opinions  of  HMOs  than  non-enrollees.  In  addition,  HMO 
enrollees  tend  to  have  lower-than-average  prior  utilization  of 
health  care  services.  For  example,  Brown  (1988)  found  evidence  of 
biased  selection  favoring  13  of  17  Medicare  Competition 
Demonstration  HMOs;  three  plans  experienced  neutral  selection;  and 
only  one  experienced  unfavorable  selection.  Brown's  finding 
implies  that  prior  use  of  services  must  be  included  in  the  outcome 
equations  to  control  for  biased  selection  (i.e.,  selection  based  on 
observable variables). Having controlled for this indirect link
between  attitude  toward  HMOs  and  outcomes,  we  agree  that  attitude 
toward  HMOs  is  a  plausible  identifying  variable  in  the  outcome 
equation. 

We  have  used  premium  payment  to  identify  health  care 
utilization  equations  for  enrollees  in  employment-based  health 
insurance  plans  (Feldman  et  al.,  1989;  Dowd  et  al.,  forthcoming). 
Premiums  for  single-coverage  health  insurance  ranged  from  $0  to  $28 
per month in our study (in 1984 dollars). PACE enrollees who are
not dually-eligible for Medicare and Medicaid have to pay a premium
of approximately $1,200-$1,800 per month, equal to Medicaid's
contribution  to  the  PACE  capitation  rate.  Therefore,  premium 
payment  would  seem  to  be  an  exceptionally  powerful  identifying 
variable  in  the  PACE  evaluation. 

In  practice,  the  effect  of  PACE  premium  on  enrollment  is 
uncertain.  It  is  possible  that  no  one  who  has  to  pay  the  PACE 
premium  will  join  PACE.  Premium  cannot  be  included  in  the  choice 
equation if it discriminates perfectly between enrollees and non-
enrollees.10 Even if discrimination were not perfect, the usefulness
of premium as an identifying variable would be sharply reduced. On
the other hand, the high PACE premium may not deter potential
enrollees. This is because they are all certified to receive LTC
and, in the traditional delivery system, they would have to pay
heavily out-of-pocket for some of these services (at least until
they spend down to the Medicaid financial eligibility level).
Consequently,  the  effect  of  PACE  premium  on  enrollment  is  an 
empirical  issue. 
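
The concern about perfect discrimination can be illustrated with a
minimal sketch. It assumes a probit enrollment equation with a single
premium indicator and uses made-up data in which no premium payer
enrolls. The function name probit_loglik, the SAMPLE data, and the
intercept value of zero are illustrative assumptions, not part of the
evaluation design. In this setup the log-likelihood keeps rising as the
premium coefficient becomes more negative, so no finite estimate of
that coefficient exists.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution

# Hypothetical data: (pays_premium, enrolled). No premium payer enrolls,
# while half of the people who pay no premium do enroll.
SAMPLE = [(0, 1), (0, 1), (0, 1), (0, 0), (0, 0), (0, 0),
          (1, 0), (1, 0), (1, 0), (1, 0)]


def probit_loglik(intercept, premium_coef, sample=SAMPLE):
    """Probit log-likelihood with index = intercept + premium_coef * pays_premium."""
    total = 0.0
    for pays_premium, enrolled in sample:
        p = _STD_NORMAL.cdf(intercept + premium_coef * pays_premium)
        total += log(p) if enrolled else log(1.0 - p)
    return total


if __name__ == "__main__":
    # The likelihood improves at every step toward minus infinity.
    for coef in (-1.0, -2.0, -4.0):
        print(coef, probit_loglik(0.0, coef))
```

If some premium payers did enroll, the likelihood would have an
interior maximum and premium could enter the choice equation.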

An  additional  point  to  consider  is  the  following:  persons  not 
dually-eligible  for  Medicare  and  Medicaid  will  probably  spend  less 
money  in  the  traditional  sector  than  those  who  are  eligible  for 
both  programs.  This  applies  both  to  Medicare/Medicaid  and  total 
reimbursement  (including  out-of-pocket  spending  and  private  Medigap 
insurance). Consequently, the "not-dually eligible" variable
cannot  be  used  to  identify  non-enrollees'  outcome  equations. 


The  foregoing  analysis  suggests  that  HCFA's  list  of 
identifying  variables  needs  further  refinement.  As  a  first  step  in 
this  direction,  we  have  specified  several  variables  to  add  to  their 
list: 

•  Distance  from  the  eligible  person's  residence  to  the 
PACE  site 

•  Variables  related  to  nursing  home  placement. 

Distance from the eligible person's residence to the PACE
site may be an identifying variable for the non-enrollees' outcome
equations. Our argument is that eligible people who live near the
PACE site will have lower travel time, other things equal, and they
may also have learned about the program by word of mouth.
Therefore, they will be more likely to join PACE. Once they have
joined, distance to the site may affect enrollees' use of services.
But it will not affect service use for non-enrollees. Therefore,
it can be used to identify the non-enrollees' outcome equations.
We  can  draw  the  analogy  to  deductibles  in  health  insurance 
choice/use  models:  assume  that  an  employee  chooses  between  a  fee- 
for-service  (FFS)  plan  with  a  deductible  and  an  HMO  without  a 
deductible.  The  FFS  deductible  affects  health  plan  choice  but  it 
does  not  affect  utilization  for  those  employees  who  choose  the  HMO. 


HCFA's  list  of  identifying  variables  seems  to  be  driven  by  an 
analogy  of  PACE  enrollment  and  the  HMO  enrollment  decision.  We 
suggest  that  joining  PACE  may  also  be  similar  (maybe  even  more  so) 
to  the  decision  to  enter  a  nursing  home.  PACE  enrollees  are  at 
high  risk  of  nursing  home  placement.  Without  this  program,  many  of 
them  will  enter  a  nursing  home  at  some  point.  PACE  may  be 
perceived  by  some  as  an  alternative  to  nursing  home  placement,  at 
least in the near term. Consequently, the literature on nursing
home  placement  may  be  a  source  for  identifying  variables  in  the 
PACE  evaluation. 

Careful  study  of  this  literature  will  be  needed  to  determine 
whether  relevant  variables  affect  only  PACE  choice  or  both  choice 
and  service  use  and,  if  so,  in  which  sector  the  use  effects  are 
observed.  For  example,  the  value  of  care  giver's  time  will 
presumably be higher, other things equal, if the care giver is
working  than  if  he/she  is  not  working.  This  will  influence  the 
care  giver  to  recommend  the  choice  that  involves  less  of  his  or  her 
time.  In  the  traditional  system,  value  of  time  probably  also 
affects  service  use,  e.g.,  the  probability  of  nursing  home 
admission  may  be  higher  if  the  care  giver  is  working.  Will  the 
same  effect  on  service  use  be  found  in  PACE?  Possibly  not,  if  the 
PACE  "style  of  care"  is  invariant  to  family  background 
characteristics.  In  this  case,  "value  of  care  giver's  time"  could 
be used to identify the PACE utilization equations. Similar
consideration should be given to other family background variables
(e.g., Does the elderly person own his/her own home? Does he or
she live at home?).


C.  Estimation  of  the  Selection  Model 
1.     Choice  of  Estimation  Method 

Econometric  models  of  selection  bias  can  be  estimated  by 
using  a  two-step  method  or  full  information  maximum  likelihood 
(FIML). The two-step method, which Manning, Duan and Rogers (1987)
refer to as limited information maximum likelihood (LIML), usually
consists  of  a  probit  sample  selection  equation  and  an  ordinary 
least  squares  (OLS)  "equation  of  interest"  containing  a  selection 
term  calculated  from  the  probit  equation.  HCFA's  choice  of 
estimation  method,  at  least  on  the  "first  pass,"  appears  to  be  the 
two-step  method.  The  two-step  estimation  method,  however,  is 
inferior  to  FIML  estimation. 

Little  (1985)  summarizes  some  of  the  problems  with  the  two- 
step  method.  In  some  empirical  studies  it  has  yielded  unstable 
results  (Lillard,  Smith  and  Welch,  1982).  It  is  less  efficient 
than  FIML  (Nelson,  1984).  The  two-step  estimator  is  sensitive  to 
violations  of  the  assumption  that  errors  in  the  choice  and  outcomes 
equations  are  linearly  related  (Lee,  1983).  Finally,  Manning,  Duan 
and  Rogers  (1987)  showed  that  the  two-step  estimator  performed 
poorly  when  the  outcome  equation  was  identified  only  by 
nonlinearity  of  the  selection  term.  Because  of  these  problems  with 
LIML,  we  recommend  that  FIML  estimation  be  considered  for  the 
evaluation  of  the  PACE  demonstration. 
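
For reference, the selection term that enters the second step of the
two-step method can be sketched as follows. The sketch assumes the
standard inverse Mills ratio form computed from a fitted probit index;
the function name selection_term and its arguments are illustrative
assumptions. It shows only how the term is formed, not how either
equation is estimated.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def selection_term(index, enrolled):
    """Return the selection term (inverse Mills ratio) for one person.

    index    -- fitted probit index from the enrollment equation (assumed given)
    enrolled -- True for PACE enrollees, False for non-enrollees
    """
    pdf = _STD_NORMAL.pdf(index)
    cdf = _STD_NORMAL.cdf(index)
    if enrolled:
        return pdf / cdf           # term for the enrollee outcome equation
    return -pdf / (1.0 - cdf)      # term for the non-enrollee outcome equation


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.0):
        print(z, selection_term(z, True), selection_term(z, False))
```

Because the term is a nonlinear function of the same index for every
person, it is nearly collinear with the outcome regressors unless the
enrollment equation contains at least one excluded variable.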


2.     Outcome Variable Equals Zero


Another  problem  that  analysts  of  health  care  utilization  or 
expenditure  data  often  encounter  is  a  large  number  of  "zero"  values 
for  the  dependent  variable.  Since  subjects  in  the  PACE  evaluation 
are  the  frail  elderly,  total  expenditures  or  utilization  may  be 
positive  for  all  respondents,  but  there  may  be  a  significant 
proportion  of  zeros  in  any  particular  expenditure  or  utilization 
equation.  For  example,  many  PACE  enrollees  will  have  zero 
expenditures  if  the  outcome  variable  of  interest  is  nursing  home 
expenditures. 

There are two approaches to the problem of a large proportion
of zeros. The first is based on the fact that the expected value
of the dependent variable, Y_t, equals:

(11)  E(Y_t) = \Pr(Y_t > 0) \cdot E(Y_t \mid Y_t > 0).

In  the  LIML  method,  the  first  term  on  the  right-hand  side  is 
estimated  with  a  probit  equation  and  the  second  term,  which  models 
the  level  of  expenditure  for  those  with  positive  expenditures,  is 
estimated  by  OLS.  Hay  and  Olsen  (1984)  and  Duan  et  al.  (1984) 
debated  the  possible  inclusion  of  a  sample  selection  term  in  the 
second equation.
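
The decomposition in equation (11) can be checked with raw sample
means, as in the minimal sketch below. The function name two_part_mean
and the spending figures are hypothetical; in the two-part method
itself, the two components would be replaced by predictions from the
probit and OLS equations.

```python
def two_part_mean(spending):
    """Return (share positive, mean among positives, product of the two)."""
    positives = [y for y in spending if y > 0]
    share_positive = len(positives) / len(spending)
    mean_if_positive = sum(positives) / len(positives) if positives else 0.0
    return share_positive, mean_if_positive, share_positive * mean_if_positive


if __name__ == "__main__":
    # Hypothetical monthly nursing home expenditures for five people.
    spending = [0.0, 0.0, 0.0, 1200.0, 1800.0]
    share, mean_pos, overall = two_part_mean(spending)
    print(share, mean_pos, overall, sum(spending) / len(spending))
```

The product of the two parts reproduces the overall sample mean, which
is why the zeros can be modeled separately from the positive amounts.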

The  second  approach  to  the  problem  of  zeros  is  the  tobit  model. 
The  tobit  model  is  based  on  the  notion  that  underlying  both  zero 
and  positive  observations  of  expenditures  is  an  index  variable, 
denoted Y^*_{it}, such that:

(12)  Y^*_{it} = \beta_t X_{it} + \sigma u_{it}, where

Y_{it} = Y^*_{it} if Y^*_{it} > 0, that is, if u_{it} > -\beta_t X_{it} / \sigma
Y_{it} = 0 otherwise.

The  tobit  model  can  be  estimated  using  a  two-step  or  FIML 
procedure.  Again,  the  FIML  procedure  appears  to  perform  better 
than  the  two-step  procedure.11 
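
As an illustration of what FIML estimation of the tobit model
maximizes, the sketch below writes out the log-likelihood under the
assumption that the error term is standard normal. The function name
tobit_loglik, the single regressor, and the separate intercept are
illustrative assumptions; no optimizer is shown.

```python
from math import log
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution


def tobit_loglik(beta0, beta1, sigma, data):
    """Tobit log-likelihood for observations (x, y) censored at zero.

    data -- iterable of (x, y) pairs with y >= 0; y == 0 marks a zero outcome
    """
    total = 0.0
    for x, y in data:
        mean = beta0 + beta1 * x
        if y > 0:
            # Density of the observed positive outcome.
            total += log(_STD_NORMAL.pdf((y - mean) / sigma) / sigma)
        else:
            # Probability that the latent index falls at or below zero.
            total += log(_STD_NORMAL.cdf(-mean / sigma))
    return total


if __name__ == "__main__":
    sample = [(1.0, 0.0), (2.0, 0.5), (3.0, 2.1), (4.0, 0.0), (5.0, 4.2)]
    print(tobit_loglik(-1.0, 1.0, 1.5, sample))
```

Zero observations contribute a probability and positive observations
contribute a density, so both kinds of observation inform the same
parameters.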

Summary  of  Selection  Issues  in  PACE  Evaluation 


The  goal  of  the  PACE  evaluation  is  to  determine  whether 
health  care  use  and  Medicare  and  Medicaid  expenditures  differ  for 
PACE  enrollees  and  a  comparison  group  of  individuals  in  traditional 
Medicare  and  Medicaid.  An  additional  goal  of  the  evaluation  is  to 
compare  health  status  and  mortality  between  PACE  enrollees  and 
participants  in  the  traditional  system.  These  goals  define  the 
choices  implicit  in  the  PACE  evaluation:  PACE  versus  all  other 
styles  of  care  for  the  frail  elderly.  In  order  to  conduct  a 
successful  evaluation,  it  will  be  necessary  to  keep  those  people 
who  used  no  Medicare  or  Medicaid  services  in  the  comparison  group. 


Identification  becomes  an  issue  whenever  a  statistical 
correction  for  selection  bias  is  used.  Identification  of  the 
outcome  equation  in  a  selection  bias  model  requires  that  at  least 
one  regressor  in  the  enrollment  equation  is  excluded  from  the 
outcome  equation.  HCFA  has  already  given  considerable  thought  to 
variables  that  might  identify  the  outcome  equations  in  the  PACE 
evaluation.  Each  potential  identifying  variable  should  be 
considered  carefully  to  determine  whether  it  should  be  excluded 
from  the  outcome  equations  of  the  experimental  and  comparison 
groups.  We  suggested  several  additional  variables  to  add  to  HCFA's 
list  of  potential  identifying  variables. 

Selection models can be estimated either by two-step (LIML)
or full information maximum likelihood (FIML) methods. Numerous
comparisons show that the FIML method is superior, in general, to
the two-step method. Special methods can be used whenever the
outcome variable of interest has a large number of "zero" values.


V.   NON-SELECTION  ISSUES  IN  THE  PACE  EVALUATION 

A.     Matching Experimental and Comparison Enrollees

The  matched  pairs  approach  is  commonly  used  in 
epidemiological  studies  where  the  number  of  subjects  is  small  and 
there  is  a  need  for  experimental  control  over  independent 
variables.  Since  the  number  of  enrollees  in  the  PACE  replications 
will  be  small,  there  would  appear  to  be  an  argument  in  favor  of 
using  a  matched  pairs  design  for  this  evaluation.  Unfortunately, 
however,  the  potential  control  population  is  also  small.  Thus,  it 
could  be  difficult  to  find  a  match  for  each  enrollee.  In  addition, 
there  may  be  differences  among  sites  in  the  range  of  functional 
impairment  represented  by  PACE  participants.  At  a  minimum,  it 
would  be  necessary  to  match  enrollees  and  non-enrollees  by 
functional  status.  Collecting  information  of  this  type  will 
require face-to-face contact and therefore it would be costly to
screen  potential  control  subjects  for  matching.  These 
considerations  suggest  that  the  matched  pairs  method  would  not  be 
appropriate  for  the  PACE  evaluation. 

B.     Collecting  Data 

On  Lok  was  intended  to  serve  a  population  that  would 
otherwise  be  very  likely  to  enter  nursing  homes.  For  example,  the 
average  On  Lok  participant  has  five  medical  diagnoses.  Half  the 
participants  are  incontinent  and  60%  are  functionally  impaired. 
Presumably,  a  comparably  frail  elderly  population  will  be  targeted 
by  the  PACE  replication  sites.  Special  characteristics  of  the 
targeted population create a subsequent need to maximize the
comparability of experimental and control groups at each site.
Thus, the successful PACE evaluation will require detailed
information  on  medical  diagnoses,  functional  limitations,  and  other 
characteristics  of  potential  enrollees.  This  requirement  applies 
with  equal  force,  whether  or  not  control  and  experimental  subjects 
are  matched. 

Detailed  information  of  this  type  is  not  available  on  Medicare 
eligibility  tapes  and  may  not  be  obtainable  through  a  simple 
telephone  "screening"  interview.  Obtaining  information  on  ADLs, 
IADLs,  chronic  conditions  and  possibly  other  characteristics  of  an 
impaired  population  through  a  telephone  survey  is  difficult,  at 
best,  and  is  especially  problematic  in  this  case.  Many  of  the 
interviews  will  have  to  be  conducted  with  "collateral"  respondents 
due  to  the  very  frail  nature  of  this  population.  Our  experience 
with  collateral  interviews  (in  the  aged  and  disabled  Medicaid 
population)  argues  strongly  for  in-person  rather  than  telephone 
surveys. We believe that obtaining this type of data will require
more hands-on data collection than HCFA has planned. This will
obviously affect the cost of the evaluation and may have
ramifications for other features (e.g., sample sizes).


Even  the  cost  data  may  prove  a  challenge.  Data  for  fee-for- 
service  controls  can  come  from  Medicare  and  Medicaid  tapes,  but 
PACE information will have to come from the sites' own files, because
the sites are capitated. HCFA is aware of the difficulties in
collecting  cost  data  from  capitated  health  plans.  Fortunately, 
HCFA  has  some  control  over  the  sites  and  can  be  directive 
concerning  the  types  of  data  to  be  collected  and  the  methods  of 
data  collection. 

C.     Pooling  Data  From  Different  Sites 
1.     Statistical  Issues 

The  statistical  issues  involved  in  pooling  data  from 
different  sites  are  straightforward.  The  null  hypothesis  is  that 
coefficients  of  an  equation  estimated  with  pooled  data  do  not 
differ  from  coefficients  of  equations  estimated  for  each  site,  that 
is,  there  are  no  site-specific  effects.  The  alternative  hypothesis 
is that site-specific effects exist. To test the null hypothesis,
the  evaluator  calculates  the  following  statistic  (Chow,   1960) : 

(13)  F = \frac{(Q_0 - \sum_i Q_i) / ((m-1)k)}{\sum_i Q_i / (\sum_i n_i - mk)}

where Q_0 = sum of squared residuals under null hypothesis

Q_i = sum of squared residuals for the ith site under the
alternative hypothesis

m = number of sites (i.e., number of equations estimated
under alternative hypothesis)

n_i = number of observations from ith site

k = number of parameters in each of the m equations.

The  null  hypothesis  is  rejected  if  F  is  greater  than  a  critical 
value (determined by the evaluator's tolerance for Type-I error)
with degrees of freedom = (m-1)k, \sum_i n_i - mk. This test is similar to
a  multiple  partial  F-test  for  the  significance  of  a  group  of 
variables  and  in  fact,  it  can  be  modified  to  test  for  the 
significance  of  any  group  of  variables  in  the  outcomes  equations. 
One  looks  at  the  increase  in  the  residual  sum  of  squares  due  to 
each  restricted  parameter,  divided  by  the  residual  mean  square 
error  when  all  parameters  are  unrestricted. 
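
The pooling test can be sketched in a few lines. The sketch below
assumes the simplest possible outcome equation, a constant plus one
regressor (so k = 2), and uses made-up data for two sites; the function
names ssr_simple_ols and chow_f are illustrative assumptions. The
returned F value would be compared with a critical value from an F
table at the returned degrees of freedom.

```python
def ssr_simple_ols(xs, ys):
    """Sum of squared residuals from regressing y on a constant and x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y - intercept - slope * x) ** 2 for x, y in zip(xs, ys))


def chow_f(sites, k=2):
    """Return (F, numerator df, denominator df) for pooling across sites.

    sites -- list of (xs, ys) pairs, one pair of equal-length lists per site
    k     -- parameters per site equation (2 here: constant and slope)
    """
    m = len(sites)
    n_total = sum(len(xs) for xs, _ in sites)
    pooled_x = [x for xs, _ in sites for x in xs]
    pooled_y = [y for _, ys in sites for y in ys]
    q0 = ssr_simple_ols(pooled_x, pooled_y)                   # restricted (pooled)
    q_sum = sum(ssr_simple_ols(xs, ys) for xs, ys in sites)   # unrestricted
    df_num = (m - 1) * k
    df_den = n_total - m * k
    return ((q0 - q_sum) / df_num) / (q_sum / df_den), df_num, df_den


if __name__ == "__main__":
    site_a = ([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.8])   # hypothetical data
    site_b = ([1, 2, 3, 4], [3.0, 3.9, 5.2, 5.9])   # hypothetical data
    print(chow_f([site_a, site_b]))
```

With only a handful of observations per site the denominator degrees
of freedom are small, so the test has little power to detect
site-specific effects.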
2.     Conceptual Issues

The  simplicity  of  the  statistical  test  for  pooling  conceals 
deeper  conceptual  issues.  Under  a  separate  HCFA  Research  Center 
project,  investigators  from  the  University  of  Minnesota  have  shown 
that the PACE replication sites differ on multiple dimensions.12
Sponsorship, state support, history and professional culture have
been identified as variables that may affect performance and that
differ  among  sites.  If  these  variables  can  be  quantified,  they  can 
be  included  in  the  PACE  outcomes  equations. 

Site-specific  variables  that  affect  outcome  levels  can  be 
included  as  main  effects  in  the  outcomes  equations.  However,  the 
number  of  site-specific  main  effects  that  can  be  analyzed  is 
limited  to  one  less  than  the  number  of  sites.  This  limitation 
includes  site  dummy  variables  as  well  as  quantifiable  differences 
among  sites.  Suppose,  for  example,  that  the  evaluator  includes 
dummy  variables  for  each  of  seven  sites,  compared  with  the  eighth 
site,  in  the  PACE  outcomes  equations.  This  uses  up  the  allowance 
of  site-specific  main  effects.  From  this  analysis,  the  evaluator 
may  conclude  that  the  sites  differ,  and  he/she  may  have  reasonable 
explanations  for  these  differences,  but  the  hypothesized  causes 
cannot  be  separated  from  all  other  unmeasured  site-specific 
variables  that  affect  outcome  levels. 
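
The limit on site-specific main effects is a rank condition, which the
following minimal sketch makes concrete. It builds one row per site
containing a constant, seven site dummies, and optionally one measured
site-level variable, then computes the rank of that matrix. The
function names matrix_rank and site_design and the site scores are
hypothetical.

```python
from fractions import Fraction


def matrix_rank(rows):
    """Rank by Gaussian elimination with exact rational arithmetic."""
    mat = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(mat[0])):
        pivot = next((r for r in range(rank, len(mat)) if mat[r][col] != 0), None)
        if pivot is None:
            continue
        mat[rank], mat[pivot] = mat[pivot], mat[rank]
        for r in range(rank + 1, len(mat)):
            factor = mat[r][col] / mat[rank][col]
            mat[r] = [a - factor * b for a, b in zip(mat[r], mat[rank])]
        rank += 1
        if rank == len(mat):
            break
    return rank


def site_design(n_sites, site_variable=None):
    """One row per site: constant, n_sites - 1 dummies, optional site variable."""
    rows = []
    for s in range(n_sites):
        row = [1] + [1 if s == d else 0 for d in range(n_sites - 1)]
        if site_variable is not None:
            row.append(site_variable[s])
        rows.append(row)
    return rows


if __name__ == "__main__":
    scores = [3, 1, 4, 1, 5, 9, 2, 6]   # hypothetical site-level measure
    print(matrix_rank(site_design(8)), len(site_design(8)[0]))
    print(matrix_rank(site_design(8, scores)), len(site_design(8, scores)[0]))
```

With the measured variable added there are nine columns but the rank
stays at eight, so its coefficient cannot be separated from the site
dummies. Individual-level data do not help, because every person at a
site shares the same values in these columns.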

Site-specific  variables  that  affect  program  performance  can 
be  interacted  with  the  dummy  variable  representing  whether  an 
individual  enrolled  in  PACE.  However,  the  number  of  interactions 
between  site-specific  variables  and  the  PACE  enrollment  variable  is 
also  limited  to  seven.  Coefficients  representing  different  program 
effects  among  sites  may  therefore  be  confounded  by  unmeasured  site- 
specific  variables. 

These  remarks  may  discourage  the  evaluator  from  pooling  data. 
However,  the  alternative  method  —  estimating  separate  equations 
for  each  site  —  does  not  make  the  results  any  easier  to  interpret. 
Separate  equations  will  produce  a  unique  intercept  and  treatment 
coefficient  for  each  site.  In  fact,  if  the  coefficients  of  other 
variables  are  equal  across  sites,  these  unique  intercept  and 
treatment  coefficients  will  equal  the  main  effects  and  interactions 
in  pooled-data  regressions.  Therefore,  the  difficulty  of 
interpreting  results  applies  equally  to  pooled-data  analysis  and 
separate  equations.  The  fundamental  problem  is  that  there  are  only 
eight  observations  to  estimate  the  effects  of  site-specific 
variables.  There  may  be  more  than  seven  such  variables,  especially 
if  dummy  variables  for  each  site  are  included  in  the  regressions. 


Our  advice  to  HCFA  is  that  they  should  identify  several  site- 
specific  variables  that  can  be  measured  reliably  and  that  are 
hypothesized  to  have  a  strong  effect  on  program  performance.  These 
variables  should  be  included  in  the  PACE  outcomes  equations  as  main 
effects  and/or  interactions,  whichever  is  appropriate.  Hopefully, 
when  these  measured  variables  are  controlled,  the  remaining 
unmeasured  site-specific  effects  will  be  insignificant. 


NOTES 


1.  Increasing  the  randomization  percentage  will  decrease  the  size 
of  the  control  group.  For  example,  if  the  eligible  population 
consists  of  500  frail  elderly  people,  increasing  the  randomization 
percentage  from  50%  to  60%  reduces  the  control  group  from  250  to 
200  people.  This  will  decrease  the  statistical  confidence 
regarding  tests  of  program  effectiveness.  However,  this  is  not  a 
defect  of  randomization.  There  are  only  500  eligible  people,  and 
whether  or  not  they  are  randomized,  the  experimental  group  must 
contain  300  of  them.  The  fundamental  problem,  in  other  words,  is 
the  small  pool  of  eligible  people. 

2.  Spillover  effects  cannot  be  dismissed  easily  in  other 
demonstrations.  For  example,  in  Medicare's  Physician  Preferred 
Provider  (PPO)  Demonstration,  spillover  of  physician  practice  style 
from  PPO  patients  to  non-PPO  patients  may  pose  a  serious  threat  to 
the  validity  of  within-site  randomization.  Spillover  effects  would 
equally  threaten  non-random  experiments.  A  multi-site  design  - 
where  some  sites  do  not  receive  the  experimental  treatment  -  is 
required  to  control  for  spillover  effects,  regardless  of  the  choice 
of  random  versus  non-random  enrollment.  See  Feldman  et  al.  (1988) 
for  a  discussion  of  the  design  and  implementation  issues  of  the 
Medicare  Physician  Preferred  Provider  Demonstration. 

3.  Note  that  the  coefficient  of  lagged  earnings,  as  well  as  the 
effect  of  training  on  earnings  growth,  would  be  biased  if  equation 
(2)  were  estimated  by  OLS. 

4.  Different  coefficients  in  the  pre-training  earnings  equations 
for  trainees  and  non-trainees  indicate  that  omitted  variables  are 
correlated  with  earnings  and  related  to  enrollment  in  the  training 
program. 

5.  Let  Yt|(D=l)  =  0tXt  +  a  +  a<p  +  0.  Then,  subtracting  (8)  from 
(7)  does  not  eliminate  a0  and  the  effect  of  training  for  a 
randomly-selected  enrollee  is  overestimated. 

6.  See  Jimenez  and  Kugler  (1987)  for  a  nice  application  of  these 
tests  for  robustness. 

7.  Thomas  E.  Brown,  from  the  South  Carolina  PACE  site,  made  this 
argument  in  a  personal  communication  to  Dr.  Feldman. 

8.  HCFA  seems  to  equate  "traditional"  Medicare  and  Medicaid  with 
fee-for-service  medicine.  For  this  evaluation,  however,  the 
traditional  Medicare  and  Medicaid  systems  also  include  HMOs.  This 
evaluation  is  testing  a  totally  integrated,  capitated  service 
delivery  system  versus  everything  else. 

9.  A weaker requirement for identification relies solely on the
nonlinearity of the probit selection equation. However, as Maddala
(1983, p. 269) points out, the outcome equation is likely to
contain nonlinear terms in X_{it}. The selection variable may pick up
these nonlinear effects, thus indicating the presence of selection
bias when none exists. To the extent that the selection term also
contains variables not included in X_{it}, it is more nearly orthogonal
to any nonlinear terms in the outcome equation, and the "false
positive" finding of selection bias is less likely.


10.  Every  variable  in  the  choice  equation  must  be  capable  of 
changing  the  probability  of  PACE  enrollment  if  that  variable 
changes  (i.e.,  the  derivatives  of  choice  with  respect  to  all 
variables must be non-zero). However, if no one who has to pay the
premium ever joins PACE, then the derivatives of all variables
except  premium  must  equal  zero  for  those  observations. 

11.  Special  methods  are  required  to  incorporate  both  the 
correction  for  sample  selectivity  and  the  correction  for  the  zero 
observations into the model. We have estimated the selectivity-
corrected tobit model (Dowd et al., forthcoming) and are currently
working on refinements to this model.

12.  The PACE (On Lok) Case Study is in progress at the University
of Minnesota, with Dr. Robert Kane, Project Investigator.
Important similarities and differences among sites are discussed in
an "Interim Report on the Qualitative Analysis of the Program for
All-inclusive Care for the Elderly (PACE)," July 30, 1990.


REFERENCES 


Brown,  Randall  S.,  "Biased  Selection  in  Medicare  HMOs,"  paper 
presented  at  the  Fifth  Annual  Meeting  of  the  Association  for  Health 
Services  Research,  San  Francisco,  CA,  June  26-28,  1988. 

Chow, Gregory C., "Tests of Equality Between Sets of Coefficients
in Two Linear Equations," Econometrica, 28 (July, 1960),
pp. 591-605.

Duan,  Naihua,  Willard  G.  Manning,  Carl  N.  Morris  and  Joseph  P. 
Newhouse,  "Choosing  Between  the  Sample-selection  and  Multi-part 
Model," Journal of Business and Economic Statistics, 2:3 (July,
1984), pp. 283-89.

Dowd,  Bryan,  Roger  Feldman,  Steven  Cassou,  and  Michael  Finch, 
"Health  Plan  Choice  and  the  Use  of  Health  Care  Services,"  Review  of 
Economics  and  Statistics,  forthcoming. 

Feldman, Roger, Bryan Dowd, Michael Finch and Steven Cassou,
Employment-Based Health Insurance. National Center for Health
Services Research and Health Care Technology Assessment, DHHS
Publication No. (PHS) 89-3434, June, 1989.

Feldman,  Roger,  Bryan  Dowd,  Jon  Christianson,  Lyle  Nelson,  Charles 
Metcalf,  Nancy  Carlson,  and  Kathryn  Langwell,  "Design  and 
Implementation  of  a  Medicare  Preferred  Provider  Organization 
Demonstration,"  report  prepared  by  the  Minnesota  HCFA  Research 
Center,  October  3,  1988. 

Fraker, Thomas and Rebecca Maynard, "The Adequacy of Comparison
Group Designs for Evaluations of Employment-Related Programs,"
Journal of Human Resources, 22:2 (Spring, 1987), pp. 194-227.

Hay, Joel and Randall J. Olsen, "Let Them Eat Cake: A Note on
Comparing Alternative Models of the Demand for Medical Care,"
Journal of Business and Economic Statistics, 2:3 (July, 1984), pp.
279-82.

Heckman, James J. and V. Joseph Hotz, "Are Classical Experiments
Necessary for Evaluating the Impact of Manpower Training Programs?
A Critical Assessment," presented at the American Economic
Association annual meeting, Chicago, Ill., December, 1987.

________, "Choosing Among Nonexperimental Methods for Estimating the
Impact of Social Programs: The Case of Manpower Training," Journal
of the American Statistical Association, 1988.

Jimenez, Emmanuel and Bernardo Kugler, "The Earnings Impact of
Training Duration in a Developing Country: An Ordered Probit
Selection Model of Colombia's Servicio Nacional de Aprendizaje
(SENA)," Journal of Human Resources, 22:2 (Spring, 1987), pp. 228-47.

LaLonde, Robert J., "Evaluating the Econometric Evaluation of
Training Programs With Experimental Data," American Economic
Review, 76:4 (September, 1986), pp. 604-20.




________ and Rebecca Maynard, "How Precise Are Evaluations of
Employment and Training Programs: Evidence from a Field Experiment,"
Evaluation Studies, 11:4 (August, 1987), pp. 428-51.

Lee, L-F., "Generalized Econometric Models With Selectivity,"
Econometrica, 51 (1983), pp. 507-12.

Lillard, L., J.P. Smith and F. Welch, "What Do We Really Know About
Wages: The Importance of Non-Reporting and Census Imputation,"
Santa Monica, CA: The Rand Corporation, 1982.

Little, Roderick J.A., "A Note About Models for Selectivity Bias,"
Econometrica, 53:6 (November, 1985), pp. 1469-74.

Maddala, G.S., Limited-dependent and Qualitative Variables in
Econometrics. Cambridge: Cambridge University Press, 1983.

Manning, W.G., N. Duan and W.H. Rogers, "Monte Carlo Simulation
Evidence on the Choice Between Sample Selection and Two-Part
Models," Journal of Econometrics, 35 (1987), pp. 59-82.

Moffitt, Robert, "Symposium on the Econometric Evaluations of
Manpower Training Programs," Journal of Human Resources, 22:2
(Spring, 1987), pp. 149-56.

Nelson, F.D., "Efficiency of the Two-Step Estimator for Models With
Endogenous Sample Selection," Journal of Econometrics, 24 (1984),
pp. 181-96.

Newhouse, Joseph P., Willard G. Manning, Emmett B. Keeler, and
Elizabeth M. Sloss, "Adjusting Capitation Rates Using Objective
Health Measures and Prior Utilization," Health Care Financing
Review, 10:3 (Spring, 1989), pp. 41-54.




APPENDIX 


In  this  appendix,  we  review  three  recent  studies  of 
selectivity  bias.  Two  of  these  studies  are  empirical  and  the  third 
uses  simulation  analysis.  They  all  have  important  implications  for 
the  proposed  PACE  evaluation. 

Thomas Fraker and Rebecca Maynard (1987) empirically
investigated the limitations of using non-experimental designs to
evaluate employment and training programs. They analyzed data from
the National Supported Work Demonstration (NSWD), a structured work
experience program aimed at persons with severe employment
disabilities. NSWD used a randomized design, with (1) an
experimental group that received the "treatment" and (2) a control
group from the eligible population. Fraker and Maynard also
selected (3) a non-random "comparison group" from the Current
Population Survey (CPS), according to the target eligibility
criteria of NSWD.

They  estimated  several  statistical  models  to  determine  the 
impact  of  this  program.  Their  "basic  model"  held  measurable 
determinants  of  earnings  (prior  earnings  and  personal  and 
environmental  characteristics)  constant.  They  found  that  earnings 
grew  at  the  same  rate  for  groups  (2)  and  (3)  during  the  pre- 
enrollment  period.  But  during  enrollment  and  follow-up,  comparison 
group  earnings  grew  more  rapidly  than  control  group  earnings  for 
youth. Had they relied on a non-experimental analysis of group (1)
versus (3), the experiment would have been judged a failure. The
true comparison, (1) versus (2), revealed a successful
experiment.

They also estimated a "fixed effect" model in which earnings
growth was regressed on changes in personal and environmental
characteristics. Finally, they used a special case of the fixed
effect model where the impact of the program was measured by the
difference between the mean earnings gain for experimental and
control (or comparison) groups. (This method was explained in
Section III.B.1. of our report.) Regardless of the model, the
performance of the comparison group method for youth was poor when
measured against the results of the randomized design. Fraker and
Maynard conclude that non-experimental designs cannot be relied on
to estimate the effectiveness of employment programs.
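
The difference-in-mean-gains estimator described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the earnings figures are invented for the example (they are not NSWD or CPS data), and the function names `mean_gain` and `program_impact` are hypothetical. The numbers are chosen so that the comparison group's earnings grow faster than the control group's, which reproduces the sign reversal reported for youth.

```python
from statistics import mean


def mean_gain(pre, post):
    """Mean earnings gain (post minus pre) across individuals in one group."""
    return mean(after - before for before, after in zip(pre, post))


def program_impact(treated_pre, treated_post, ref_pre, ref_post):
    """Impact estimate: mean gain of the treated group minus mean gain of a
    reference group (a randomized control group or a non-random comparison
    group)."""
    return mean_gain(treated_pre, treated_post) - mean_gain(ref_pre, ref_post)


# Hypothetical annual earnings, invented for illustration only.
experimental_pre = [1000, 1200, 800, 1100]     # group (1)
experimental_post = [1900, 2200, 1700, 2000]
control_pre = [1050, 1150, 850, 1000]          # group (2), randomized
control_post = [1550, 1650, 1350, 1500]
comparison_pre = [1100, 1250, 900, 1050]       # group (3), non-random
comparison_post = [2200, 2350, 2000, 2150]

# (1) versus (2): the experimental benchmark.
experimental_estimate = program_impact(
    experimental_pre, experimental_post, control_pre, control_post)

# (1) versus (3): the non-experimental estimate.
comparison_estimate = program_impact(
    experimental_pre, experimental_post, comparison_pre, comparison_post)

print(experimental_estimate)  # 425
print(comparison_estimate)    # -175
```

In this constructed example the program raises earnings by 425 relative to the control group, yet appears to lower them by 175 relative to the comparison group, solely because the comparison group's earnings grew faster for reasons unrelated to the program.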

Robert  LaLonde  (1986)  also  analyzed  data  from  the  NSWD 
experiment,  using  several  comparison  groups  drawn  from  the  CPS. 
Unlike  Fraker  and  Maynard,  however,  he  selected  only  those 
comparison  groups  that  passed  a  specification  test  for  the 
similarity  of  pre-training  earnings  coefficients  (see  page  9  and 
notes  4  and  5  of  our  report).  LaLonde  found  that  such 
specification  tests  are  appealing,  in  that  they  can  reject  certain 
comparison  groups  as  inappropriate.  Nevertheless,  these  tests  are 
not sufficient because, even when the test was passed, the
comparison group method performed poorly.
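
One standard way to test whether a set of regression coefficients is equal across two groups is the F test of Chow (1960), which appears in the references. The sketch below computes that statistic from sums of squared residuals. Whether this is exactly the test LaLonde applied is an assumption made for illustration; the function name `chow_f_statistic` and the input values are hypothetical.

```python
def chow_f_statistic(ssr_pooled, ssr_group_a, ssr_group_b, n_a, n_b, k):
    """Chow (1960) F statistic for equality of k coefficients across two
    linear regressions.

    ssr_pooled  -- sum of squared residuals from the pooled regression
    ssr_group_a -- sum of squared residuals from group A's own regression
    ssr_group_b -- sum of squared residuals from group B's own regression
    n_a, n_b    -- number of observations in each group
    k           -- number of coefficients estimated in each regression
    """
    ssr_separate = ssr_group_a + ssr_group_b
    dof = n_a + n_b - 2 * k  # denominator degrees of freedom
    if k <= 0 or dof <= 0:
        raise ValueError("need k > 0 and n_a + n_b > 2 * k")
    return ((ssr_pooled - ssr_separate) / k) / (ssr_separate / dof)


# Hypothetical residual sums of squares for pre-training earnings equations.
f_stat = chow_f_statistic(ssr_pooled=120.0, ssr_group_a=50.0,
                          ssr_group_b=50.0, n_a=30, n_b=30, k=5)
print(f_stat)  # 2.0, to be compared with an F(5, 50) critical value
```

A small statistic means the pooled and separate regressions fit about equally well, so the comparison group is not rejected. As the text notes, passing such a test did not guarantee good performance.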


LaLonde  estimated  a  LIML  selectivity  model  on  the 
experimental  and  control  subjects.  As  expected,  the  estimated 
enrollment  coefficients  were  insignificant.  In  fact,  this  is  a 
test  of  successful  randomization.  Estimates  of  the  training  effect 
obtained  from  this  LIML  selection  model  were  nearly  identical  to 
the  experimental  results. 




Finally, he estimated the LIML selectivity model on
experimental and comparison group subjects. His reported results
are generally closer to the true experimental results than those
from one-step (i.e., simple regression) methods. The
selection-corrected results are quite close for some comparison
groups but they are sensitive to the sometimes arbitrary
assumptions used to identify the earnings equations.
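
To make the selection correction concrete, the sketch below computes the correction term used in a two-step selectivity estimator of the Heckman type, which is one common way a limited-information selectivity model is estimated. It assumes normally distributed errors and an enrollment index already estimated from a probit equation. The function names and the index values are hypothetical, and this is not a reconstruction of LaLonde's own procedure.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal: mean 0, standard deviation 1


def inverse_mills_ratio(index):
    """phi(index) / Phi(index) for the standard normal distribution."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


def selection_term(index, enrolled):
    """Selection-correction regressor for one person.

    index    -- estimated enrollment index from the choice (probit) equation
    enrolled -- True for program participants, False for non-participants

    Participants receive phi/Phi; non-participants receive -phi/(1 - Phi).
    Adding this term to the earnings regression is intended to absorb the
    correlation between unobserved determinants of enrollment and earnings.
    """
    if enrolled:
        return inverse_mills_ratio(index)
    return -_STD_NORMAL.pdf(index) / (1.0 - _STD_NORMAL.cdf(index))


# Hypothetical enrollment indices for three people.
for index, enrolled in [(0.0, True), (0.0, False), (1.0, True)]:
    print(index, enrolled, round(selection_term(index, enrolled), 4))
```

Without an exclusion restriction (a variable that affects enrollment but not earnings), this term is identified only through the nonlinearity of the normal distribution, which is one reason the results can be sensitive to the identifying assumptions.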

Manning,  Duan,  and  Rogers  (1987)  used  simulation  analysis  to 
examine  the  relative  performance  of  sample  selection  and  two-part 
models  for  data  with  a  cluster  at  zero.  They  created  two  simulated 
data  sets:  one  in  which  there  were  no  exclusion  restrictions  to 
identify  the  outcome  equation,  and  another  in  which  the  outcome 
equation was identified. In the first case, two-part models
performed no worse, and often appreciably better, than LIML and FIML
sample selection models in terms of mean behavior. LIML sample
selection  models  had  the  worst  performance.  However,  in  the  second 
case,  LIML  sample  selection  performed  better  than  two-part  models. 
In  fact,  it  performed  so  well  in  terms  of  point-wise  bias  that  the 
authors  did  not  examine  the  FIML  selection  model  for  this  case. 

Our  review  of  these  articles  points  toward  several 
conclusions.  First,  simple  comparison  group  methods  may  not 
perform  well  in  non-random  experiments.  Second,  specification 
tests  are  a  useful  way  to  eliminate  some  unsuitable  comparison 
groups,  but  they  are  not  sufficient.  Thus,  our  third  conclusion  is 
that  a  statistical  selection  correction  is  clearly  required  in  non- 
random experiments. LaLonde's empirical study and the simulation
analysis by Manning et al. indicate the fourth and final
conclusion: care must be taken in identifying the outcome equation in
selection models. Provided that this is done, the empirical and
theoretical studies both demonstrate that sample selection models
can remove bias from non-experimental data. HCFA should keep these
conclusions in mind when conducting the PACE evaluation.




